-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add C3ID implementation #21
Conversation
This ports C3ID computation (that we will need soon) from https://github.com/microsoft/security-utilities Changes: - Company name is a parameter and not hard-coded to Microsoft - Optimized like the rest of Cask to avoid allocations and write the result to a span Tests added here were also run against the implementation in security-utilities to check equivalence. Also: fix edge cases with empty span input in encoding polyfills.
#22 should fix the build issue. Let's get that reviewed and merged first. |
[InlineData("😁", "f/BTV0j6A8km4KDw7aJz")] | ||
public void Test_Basic(string text, string expected) | ||
{ | ||
string actual = ComputeC3IDBase64(company: "Microsoft", text); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey is the proper rendering C3ID
or C3id
?
Haha. It is a true fact that I treat this as a valid question despite the obvious triviality of the concern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I prefer C3id
but need I mention I have no desire to die on this hill? :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tehe. I think I prefer C3Id
or C3ID
over C3id
. Id
is a word so if the convention in C#
is CreateUrl
, C3Id
would be the most consistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does not appear in any identifier other than in tests. As such, my care factor is vanishingly small. I'm just as picky about these things in public API, so no judgment. 😁 I'll happily change it, but I still don't know which to use...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went with C3Id
now. I need it in non-test code in my next change. Still not public, but I think it's worth settling this right now and not having noise of renaming it mixed in with other changes.
My reason for C3Id
over C3id
is a nod to my much younger self that rewrote the FxCop casing tokenizer ~20 years ago. That would parse this as two tokens C3 and id. And only C3Id
respects the guidelines with that tokenization. 😁
I'd ask to get rid of the number in the middle of an acronym, but I think I've caused enough trouble asking for algo changes.
- Use base64-decoded prefix for hash input and base64-encode input - Throw on empty or whitespace
Prefix.CopyTo(input); | ||
SHA256.HashData(textUtf8, input[Prefix.Length..]); | ||
|
||
// Perform second hash, truncate, and copy to destination. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm getting concerned about the cost here. Two rounds of SHA256 is a lot of compute for a 12 byte hash.
For context from offline conversation: the rationale (as I understand it, but I may not) for two rounds in C3ID is to make it possible to compute the C3ID from the plain SHA256. Without sharing the key and without adding C3ID to their toolkit, someone can send you the SHA256 of a key, and then you can convert that to C3ID.
We should maybe reconsider if this scenario is worth the cost added to CASK scenarios. I'll provide some data on that cost in a minute.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without C3ID in hash scenarios
From #18 (comment)
Method | Mean | Error | StdDev | Gen0 | Allocated |
---|---|---|---|---|---|
CompareHash_Cask | 451.609 ns | 3.9292 ns | 3.6753 ns | - | - |
CompareHash_Floor | 294.875 ns | 2.7652 ns | 2.5866 ns | - | - |
GenerateHash_Cask | 300.978 ns | 1.3669 ns | 1.2786 ns | 0.0091 | 176 B |
GenerateHash_Floor | 225.569 ns | 1.6476 ns | 1.2864 ns | 0.0057 | 112 B |
With C3ID in hash scenarios
From my work-in-progress branch:
Method | Mean | Error | StdDev | Gen0 | Allocated |
---|---|---|---|---|---|
CompareHash_Cask | 856.6 ns | 6.06 ns | 5.37 ns | - | - |
CompareHash_Floor | 298.5 ns | 3.69 ns | 3.27 ns | - | - |
GenerateHash_Cask | 725.1 ns | 5.20 ns | 4.86 ns | 0.0114 | 216 B |
GenerateHash_Floor | 226.3 ns | 0.78 ns | 0.69 ns | 0.0057 | 112 B |
There are some other changes in this branch so this doesn't quite isolate the C3ID cost. However, I also measured on this machine that a single SHA256 of Cask key sized data -- without allocation, copying, or encoding conversion -- takes about 120ns on this machine. That's a lot relative to the "floor" scenarios we're aiming to replace.
Up to now, I was confident in being able to tell a good story about how the extra Cask costs were a fair price to pay for Cask features, but I am not sure C3ID is living up to that now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With that said, I think it would be good to take this change and discuss options to improve this in follow-up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack on this and let's talk more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This implements C3ID computation that we will need soon.
The implementation differs from the one in https://github.com/microsoft/security-utilities and this seeks to become the new standard.
Changes:
Tests include a naive, unoptimized implementation that is easier to understand, and we check that both implementations match. We also check that we produce the same deterministic value for a sampling of hard-coded test input.
This change also fixes edge cases with empty span input in encoding polyfills.